Abstract


Biodiversity monitoring using audio recordings is achievable at a truly global 
scale via large-scale deployment of inexpensive, unattended recording stations 
or by large-scale crowdsourcing using recording and species recognition on mobile
devices. The ability, however, to reliably identify vocalising animal species
is limited by the fact that acoustic signatures of interest in such recordings are
typically embedded in a diverse and complex acoustic background. To avoid the
problems associated with modelling such backgrounds, we build generative models
of bird sounds and use the concept of novelty detection to screen recordings to
detect sections of data which are likely bird vocalisations. We present detection
results against various acoustic environments and different signal-to-noise ratios.
We discuss the issues related to selecting the cost function and setting detection
thresholds in such algorithms. Our methods are designed to be scalable and
automatically applicable to arbitrary selections of species depending on the specific
geographic region and time period of deployment.


1 Introduction 

The present-day availability of cheap recording devices, widespread use of mobile devices and
availability of network connectivity make possible the recording, storage and online accessing of
environmental sound at an unprecedented level. In order for this large collection of data to be useful
for the purposes of biodiversity monitoring, methods are needed for the automatic identification of
vocalising animals. In the case of bird sound identification that is considered here, most previous
work has focused on the problem of classifying a given sound interval that is known to predominantly
contain bird sound among a predetermined set of classes (usually corresponding to different bird
species). However, such work is not directly applicable to recordings obtained in the way described
above. This is because such recordings are typically sparse in bird sound but can also contain any
number of other sources of sound. The output of a classifier trained and optimised to discriminate
between different classes of bird sound can be problematic to interpret and evaluate for such
non-bird sound inputs. At the same time, designing, training and testing, in a statistically balanced
way, classifiers that include a model for every possible type of acoustic background is not a
practical option.

An alternative, hierarchical, approach would be to include a screening stage where bird sound is 
first detected against other sources and then classified to species classes. In this work we take the 
approach of building a probabilistic generative normality model for each one of the classes of interest
(different bird species). Our objective is for each one of these individual models to be adequately 
discriminative against non-bird sound; discrimination between different classes of bird sound is 
not the primary target. Designing this first stage in a principled probabilistic manner facilitates the 
effective and versatile integration of its inference and decision output into a subsequent classification 
stage. That classification stage can be flexible in terms of the bird species (classes) considered, for 
example by using a ‘lazy’ classifier where the classes to be considered are chosen after the training 
stage. Further to that, building per-species normality models can be directly used in human-machine 
cooperation schemes where, for instance, very large collections of recorded data are drastically 
condensed to a probabilistically ordered list of inferred occurrences of any chosen class of interest 
and presented to a human expert for validation. 

Previous work on birdsong detection ([1][2][3][4]) relies on the existence of a cleanly annotated dataset
for the training of the automatic recognition system, in many cases segmented and annotated down
to the syllable level. The creation of such cleanly annotated training data is laborious and cannot
feasibly be extended to cover the total number of (approximately 10000) bird species. In addition,
results obtained on such fragmented or proprietary training and testing data are not easy to
reproduce independently and evaluate in comparison with other methods. On the other hand,
audio recordings covering a virtually world-wide distribution of bird species, albeit with less detailed
annotation metadata, are currently openly available on crowdsourced platforms. A prime example is
the xeno-canto website (www.xeno-canto.org), which currently provides in excess of 200000
recordings of more than 9000 avian species. We make use of the less detailed annotation that is
provided by individual users in that collection to obtain training data for the determination of generative
models for 15 species. We subsequently test the effectiveness of these models in determining the
location of bird sound that we have added at random positions inside background sound recordings.

The objective of the work presented in this paper is to examine the effectiveness and performance 
limitations of a baseline audio feature extraction and normality model estimation method and to 
identify directions for the improvement of its performance. The methods we present are completely 
free of any manual data preprocessing and hence (in conjunction with the aforementioned source of 
available data) directly scalable to larger numbers of bird species. 

2 Methods 

The data we use for testing our detection method consist of environmental sounds from the
‘IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events’ competition
database¹. Separate rounds of experiments were run with randomly selected recordings from the
‘Park’ and the ‘Open air market’ categories. The former category is chosen as representative
of an urban natural environment; it contains recordings with larger segments of silence and is
expected to be less confounding for the detection task. Recordings from the latter category are very
dense in human speech and busy urban environment sounds and are expected to be more challenging
for the detection task. The clean bird sound data are recordings of various lengths (ranging
from approximately 0.5 s to 10 s) obtained from the ‘Reference Animal Vocalisations’ database of
the ‘Animal Sound Archive of the Museum für Naturkunde Berlin’². We choose 15 bird species for
which that database has more than 100 recording samples (however, we only use recordings that are
‘Open Access’). In the results presented in the next section, species numbers follow the alphabetical
ordering in Table 1.

For the training data we use recordings from the xeno-canto database labelled with the same 15
species chosen for testing. For each species we select only recordings labelled with an ‘A’ (highest)
quality rating and annotated as not having other bird species in the background. We compute
a spectrogram for each recording (20 ms frame length, 50% overlap, FFT length equal to the closest
power of two that gives at least 93 Hz frequency bin spacing for the sampling rate of the specific
recording). We only keep frequency bins in the range of 1-10 kHz. This frequency range retains
the vocalisation frequency content of most avian species while discarding low frequency wind noise
and recorder self-noise, and it is typical in the related literature. The recordings in xeno-canto
typically contain long intervals of other ambient or anthropogenic sounds, but in recordings with an ‘A’

¹ http://c4dm.eecs.qmul.ac.uk/sceneseventschallenge

² http://www.animalsoundarchive.org/RefSys/Preview.php
Table 1: List of species, features and feature vectors used for the design of the bird sound normality
models in different experimental cases.

     Species                          Features
 1   Emberiza hortulana           1   Mean
 2   Emberiza schoeniclus         2   Standard Deviation
 3   Fringilla coelebs            3   Skewness
 4   Luscinia luscinia            4   Kurtosis
 5   Luscinia megarhynchos        5   Mode
 6   Parus major                  6   SFM
 7   Periparus ater                   Feature sets examined
 8   Phoenicurus phoenicurus          [1]
 9   Phylloscopus collybita           [5]
10   Phylloscopus ibericus            [1 2]
11   Phylloscopus trochilus           [5 2]
12   Rallus aquaticus                 [1 2 6]
13   Sylvia atricapilla               [5 2 6]
14   Turdus merula                    [1 2 3 4]
15   Turdus philomelos                [1 2 3 4 5 6]


quality rating these non-bird intervals are predominantly silence or low frequency wind or recorder
self-noise. In each recording’s spectrogram we discard frames with total power less than the 90th
percentile, expecting that the selected 10% most energetic frames come nearly exclusively from bird
sound time intervals (we verified this selectively by listening). Each of the selected spectrogram
frames is normalised to unit sum (hence having the form of a probability mass function, pmf) and
the 6 spectral statistics listed in Table 1 are extracted as features. Quadratic interpolation between the two
frequency bins surrounding the maximum is applied to determine the frequency position
of the spectral peak (mode of the pmf). The Spectral Flatness Measure (SFM) is computed with the
method described in [5]. Each element in the 6-element feature vector is standardised to zero mean
and unit standard deviation via a simple (invertible) linear transform.
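
The per-frame feature computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the function names are hypothetical, the percentile is taken with a nearest-rank rule, the mode is refined with a parabola through the peak bin and its two neighbours, and the SFM is taken as the ratio of geometric to arithmetic mean of the normalised frame (the cited method may differ in detail).

```python
import math


def keep_energetic_frames(frames, percentile=90.0):
    """Keep frames whose total power is at or above the given percentile."""
    totals = sorted(sum(frame) for frame in frames)
    # Nearest-rank percentile of the per-frame total power (assumed rule).
    rank = max(1, math.ceil(percentile * len(totals) / 100))
    threshold = totals[rank - 1]
    return [frame for frame in frames if sum(frame) >= threshold]


def spectral_features(power, freqs):
    """Return (mean, std, skewness, kurtosis, mode, sfm) for one frame.

    power: non-negative power per retained frequency bin (not all zero).
    freqs: centre frequency in Hz of each bin, uniformly spaced.
    """
    total = sum(power)
    pmf = [p / total for p in power]  # normalise the frame to unit sum

    mean = sum(f * p for f, p in zip(freqs, pmf))
    var = sum((f - mean) ** 2 * p for f, p in zip(freqs, pmf))
    std = math.sqrt(var)
    if std > 0.0:
        skew = sum(((f - mean) / std) ** 3 * p for f, p in zip(freqs, pmf))
        kurt = sum(((f - mean) / std) ** 4 * p for f, p in zip(freqs, pmf))
    else:
        skew = kurt = 0.0  # degenerate frame with all power in one bin

    # Mode: vertex of the parabola through the peak bin and its neighbours.
    k = max(range(len(pmf)), key=pmf.__getitem__)
    mode = freqs[k]
    if 0 < k < len(pmf) - 1:
        a, b, c = pmf[k - 1], pmf[k], pmf[k + 1]
        denom = a - 2.0 * b + c
        if denom != 0.0:
            mode += 0.5 * (a - c) / denom * (freqs[1] - freqs[0])

    # Spectral flatness: geometric mean over arithmetic mean (assumed form).
    eps = 1e-12  # avoids log(0) for empty bins
    log_gm = sum(math.log(p + eps) for p in pmf) / len(pmf)
    sfm = math.exp(log_gm) / (sum(pmf) / len(pmf))
    return mean, std, skew, kurt, mode, sfm
```

Standardisation to zero mean and unit standard deviation would then be applied per feature across the selected training frames.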

For each species we randomly select 6000 feature vectors (given the spectrogram computation overlap
and the 90th percentile selection, this corresponds to 60-120 s of ‘bird only’ frames or, equivalently,
to 10-20 min of actual recording duration) and use them to train a full covariance Gaussian
Mixture Model (GMM) using Expectation Maximisation (EM) (for a detailed description of GMM
fitting using EM see [6], ch. 9). GMMs are fitted to different selections of feature sets ranging from
univariate up to the full 6-dimensional feature vector, as listed in Table 1. In each case the EM
algorithm is run 10 times with different random initialisations and the GMM that converged to the highest
log-likelihood is selected. We set the number of components to vary from 1 to 15. The Minimum
Description Length (MDL, [7]) model selection criterion is computed for each one of the 1- to
15-component GMMs.
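
As an illustration of the model selection step, the sketch below scores candidate GMMs with a common two-part form of the MDL criterion, MDL = -log L + (k/2) log N, where k is the number of free parameters of a full covariance mixture and N the number of training vectors. The exact expression used in the cited reference is not reproduced in this paper, so this form, and the function names, are assumptions.

```python
import math


def gmm_free_parameters(n_components, dim):
    """Free parameters of a full covariance GMM with the given size."""
    weights = n_components - 1                    # mixing weights sum to one
    means = n_components * dim
    covariances = n_components * dim * (dim + 1) // 2
    return weights + means + covariances


def mdl(log_likelihood, n_components, dim, n_samples):
    """Two-part MDL score (assumed form); lower is better."""
    k = gmm_free_parameters(n_components, dim)
    return -log_likelihood + 0.5 * k * math.log(n_samples)


def select_by_mdl(log_likelihoods, dim, n_samples):
    """Pick the component count with the smallest MDL score.

    log_likelihoods maps a component count to the best total training
    log-likelihood found over the random EM restarts for that count.
    """
    return min(
        log_likelihoods,
        key=lambda m: mdl(log_likelihoods[m], m, dim, n_samples),
    )
```

For example, with hypothetical log-likelihoods of -9000, -8500 and -8495 for 1, 2 and 3 components of a univariate model fitted to 6000 frames, `select_by_mdl` returns 2: the small gain from the third component does not pay for its three extra parameters.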

The same feature extraction process is applied to the test signals, but keeping all the frames (no
10% power thresholding). The computed test features are normalised with the mean and standard
deviation values computed from the training feature set. The value of the probability density function
(pdf) for each test frame feature vector is obtained from the corresponding species’ GMM, and
per-frame pdf values are averaged over consecutive frames spanning a 500 ms time interval. Averaged
GMM pdf values are taken at 100 ms steps. The Receiver Operating Characteristic (ROC) curve
is computed based on whether the 500 ms interval (or the larger part of it) is within the limits
of the randomly chosen placement of the bird signal. For a binary classification method that assigns a
probability (or probabilistically interpretable score) of class membership to each given test instance,
the ROC curve traces the True Positive Rate versus False Positive Rate points obtained when the
class separation threshold moves continuously from minus infinity to plus infinity. It thus effectively
evaluates the performance of the classifier in correctly sorting the given instances (in our case
500 ms intervals) with respect to membership in the two classes (in our case, time intervals known
to contain bird sound versus time intervals of background sound). The corresponding
Area Under the Curve (AUC) metric is one effective way to evaluate such classifier discrimination
performance in cases of imbalanced proportion of positive to negative examples in the test set (for
Figure 1: Example. Audio signal (grey line) with marked bird limits (black dashed line) and value
of the corresponding species GMM pdf averaged over a 500 ms interval.


the related details see [8]). The results of the next section are median values of the AUC obtained
over 50 repetitions of test signals as described above. Figure 1 provides an example of such a test
sample.
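
The scoring and evaluation steps can be sketched as below. This is a hedged illustration only: the function names are hypothetical, the mixture density is shown for the univariate case, and the window sizes in frames assume a 10 ms hop (20 ms frames at 50% overlap), so that 500 ms corresponds to 50 frames and 100 ms to 10 frames. The AUC is computed through the equivalent rank statistic (the fraction of positive/negative pairs that the score orders correctly) rather than by tracing the ROC curve.

```python
import math


def gmm_pdf_1d(x, weights, means, stds):
    """Density of a univariate Gaussian mixture evaluated at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def windowed_mean(values, win=50, step=10):
    """Average per-frame values over sliding windows of `win` frames."""
    return [
        sum(values[i:i + win]) / win
        for i in range(0, len(values) - win + 1, step)
    ]


def window_labels(frame_is_bird, win=50, step=10):
    """Label a window positive when most of its frames lie in the bird span."""
    return [
        2 * sum(frame_is_bird[i:i + win]) > win
        for i in range(0, len(frame_is_bird) - win + 1, step)
    ]


def auc(scores, labels):
    """Area under the ROC curve; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because the AUC depends only on how the windows are ranked, it can be reported without first fixing a detection threshold.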

3 Results 

Figure 2 summarises the per-species detection performance (median and interquartile range
of the AUC over 50 tests) obtained when a bird sound normality model fitted to each individual
species’ training data is used to detect vocalisations of that species embedded at random positions
(different in each individual test) in a non-bird background. The results obtained with GMM density
estimation models for each of the 8 feature combinations listed in Table 1 are plotted in separate
subplots. Three different cases of background are considered, namely recordings from the ‘Park’
category with the SNR set to +3 dB and from the ‘Open air market’ category with the SNR set to
+3 dB and -3 dB. We note that each set of results for the three different backgrounds was obtained
with a different draw of 6000 frames for the GMM modelling. The results plotted in Figure 2 are
those obtained with the GMM whose number of components is determined by the minimum value
of the MDL criterion over the range of 1 to 15 components. The same results are also summarised
in tabular form in Table 2.
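
One way to construct such a test signal is sketched below. The paper does not state how the SNR is measured, so this sketch makes the explicit assumption that it is the ratio of the mean power of the scaled bird sound to the mean power of the background segment it overlaps; the function name and arguments are hypothetical.

```python
import math
import random


def mix_at_snr(background, bird, snr_db, rng=random):
    """Add `bird` to `background` at a random offset with the given SNR.

    Returns the mixed signal and the (start, end) sample limits of the
    bird sound, which serve as ground truth for the ROC computation.
    Assumes the overlapped background segment is not pure digital silence.
    """
    start = rng.randrange(len(background) - len(bird) + 1)
    end = start + len(bird)
    segment = background[start:end]

    p_bird = sum(x * x for x in bird) / len(bird)
    p_background = sum(x * x for x in segment) / len(segment)
    # Gain that makes (gain^2 * p_bird) / p_background equal the target SNR.
    gain = math.sqrt(p_background / p_bird * 10.0 ** (snr_db / 10.0))

    mixed = list(background)
    for i, x in enumerate(bird):
        mixed[start + i] += gain * x
    return mixed, start, end
```

Passing a seeded `random.Random` instance as `rng` makes the placement repeatable across runs.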

As can be seen in Figure 2, the best results overall are obtained by using only one feature, namely the
mode, with a univariate GMM. In that case the median AUC obtained by the MDL-selected GMM
is higher than 0.9 for 11 of the 15 species in the ‘Park’ +3 dB SNR background case and for 10 of the 15
species in the other background cases. The lowest median AUC is 0.70, 0.64 and 0.56 for the three
background cases (all for the Luscinia luscinia species). The discriminative power of this particular
feature is not surprising considering previously published classification results that rely on features
based on frequency peak tracking ([9][10]). On the other hand, overall, the higher dimension models
fail to enrich the normality model (bird sound) so as to discriminate more successfully
against background. For particular species (Turdus merula and Turdus philomelos) the inclusion
Figure 2: Median (dotted circle) and interquartile range (25th to 75th percentile - vertical bar) of the
AUC over 50 tests. Three cases of background (‘Park’ category with +3dB SNR, ‘Open air market’ 
category with +3dB and -3dB SNR) are plotted with a horizontal shift and with black, grey and light 
grey colours respectively. 


of the SFM feature in the model appears to achieve consistently better results. Even though the 
detection method investigated here can, in principle, be applied with normality models based on 
different selections of features (possibly individually optimised for each different species), a more 
detailed examination of such a possibility is required to support the practical merit of such an option. 
Table 2: Median of the AUC over 50 tests. Columns correspond to feature vectors and rows to
species, both in the same order as listed in Table 1. The first line of each cell is the median AUC and
the second line is the number of components as chosen by the minimum of the MDL criterion for
the three cases of background considered.


Sp.  [1]             [5]             [1 2]           [5 2]           [1 2 6]         [5 2 6]         [1 2 3 4]       [1 2 3 4 5 6]
1    0.83-0.84-0.78  0.98-0.96-0.93  0.97-0.95-0.91  0.98-0.93-0.94  0.98-0.95-0.94  0.98-0.97-0.88  0.95-0.90-0.86  0.93-0.93-0.83
     3-4-3           6-9-6           5-6-6           15-15-8         10-8-11         15-12-9         15-15-15        15-15-15
2    0.98-0.94-0.90  0.99-0.99-0.98  0.99-0.99-0.96  1.00-0.98-0.96  1.00-0.99-0.93  0.97-0.96-0.93  0.95-0.80-0.70  0.75-0.85-0.63
     4-4-3           5-4-8           6-6-5           8-6-7           9-10-7          9-8-10          15-15-15        15-15-15
3    0.95-0.86-0.82  0.95-0.93-0.93  0.97-0.95-0.96  0.94-0.95-0.91  0.95-0.96-0.90  0.95-0.96-0.91  0.90-0.81-0.69  0.88-0.86-0.74
     3-3-2           5-4-4           5-5-5           6-8-5           8-9-7           8-10-11         15-15-15        15-15-15
4    0.28-0.33-0.37  0.70-0.64-0.57  0.77-0.60-0.56  0.80-0.67-0.62  0.72-0.65-0.61  0.73-0.76-0.65  0.63-0.61-0.61  0.70-0.68-0.65
     3-3-3           5-5-6           6-6-7           9-7-7           10-12-11        11-10-10        15-15-15        15-15-15
5    0.30-0.21-0.32  0.82-0.80-0.79  0.84-0.75-0.75  0.85-0.76-0.73  0.87-0.84-0.79  0.84-0.79-0.78  0.80-0.80-0.75  0.69-0.65-0.66
     3-3-3           5-5-7           6-5-4           7-6-5           10-7-8          8-7-9           15-15-15        15-15-15
6    0.95-0.92-0.85  0.93-0.91-0.92  0.94-0.94-0.90  0.91-0.92-0.83  0.93-0.87-0.86  0.91-0.88-0.85  0.91-0.91-0.83  0.88-0.87-0.68
     3-3-3           5-4-4           5-5-6           6-7-6           10-9-11         11-10-12        15-15-15        15-15-15
7    0.98-0.96-0.93  0.94-0.95-0.95  0.97-0.95-0.85  0.97-0.94-0.84  0.90-0.92-0.84  0.91-0.96-0.86  0.64-0.65-0.53  0.76-0.67-0.48
     3-3-2           4-4-4           8-9-10          10-9-11         12-9-12         13-13-9         15-15-15        15-15-15
8    0.96-0.88-0.82  0.98-0.97-0.97  0.99-0.98-0.95  0.99-0.97-0.96  0.99-0.99-0.97  0.96-0.97-0.95  0.98-0.98-0.87  0.97-0.87-0.86
     3-4-4           5-3-3           7-6-8           6-6-8           9-7-8           9-9-8           15-15-15        15-15-15
9    0.97-0.96-0.88  0.96-0.96-0.94  0.98-0.96-0.94  0.97-0.96-0.92  0.99-0.98-0.93  0.94-0.96-0.92  0.97-0.89-0.71  0.92-0.92-0.78
     3-3-3           6-4-10          6-6-7           7-9-7           9-9-8           9-8-11          15-15-15        15-15-15
10   0.99-0.96-0.95  0.95-0.95-0.92  0.92-0.90-0.88  0.85-0.84-0.74  0.93-0.86-0.69  0.81-0.77-0.67  0.59-0.47-0.65  0.48-0.54-0.50
     3-2-5           5-5-8           6-7-5           8-10-10         12-12-10        13-12-11        14-15-15        15-15-15
11   0.95-0.86-0.85  0.95-0.93-0.93  0.93-0.94-0.89  0.91-0.91-0.85  0.94-0.96-0.94  0.91-0.92-0.86  0.81-0.81-0.67  0.87-0.83-0.65
     3-4-3           5-4-6           11-9-9          8-8-9           9-10-10         8-8-10          15-15-15        15-15-15
12   0.66-0.51-0.55  0.90-0.88-0.89  0.80-0.77-0.79  0.86-0.85-0.82  0.59-0.68-0.69  0.77-0.79-0.76  0.53-0.35-0.50  0.39-0.51-0.56
     3-3-3           6-7-10          8-7-7           7-10-8          10-9-11         9-12-9          15-15-15        15-15-15
13   0.80-0.71-0.65  0.96-0.93-0.90  0.97-0.96-0.93  0.95-0.94-0.89  0.97-0.97-0.94  0.95-0.96-0.94  0.94-0.95-0.85  0.81-0.92-0.79
     3-3-3           3-3-3           3-3-3           5-5-4           6-8-6           7-7-7           15-15-15        15-15-15
14   0.41-0.44-0.38  0.79-0.77-0.66  0.81-0.60-0.63  0.88-0.76-0.69  0.96-0.92-0.87  0.91-0.88-0.82  0.74-0.79-0.70  0.74-0.78-0.73
     4-4-4           5-5-5           6-7-7           7-7-7           10-9-8          12-12-11        15-15-15        15-15-15
15   0.48-0.48-0.48  0.88-0.80-0.78  0.94-0.86-0.79  0.84-0.85-0.76  0.93-0.95-0.91  0.84-0.86-0.75  0.83-0.85-0.81  0.85-0.75-0.70
     4-3-3           3-8-5           4-4-4           7-7-8           8-7-8           9-7-8           15-15-15        15-15-15


Comparison of the results obtained with the three different background cases shows an expected but
limited degradation between the ‘Park’ and ‘Open air market’ cases and, in turn, when the SNR is
reduced by 6 dB in the latter case. The consistency of these comparative results is encouraging when
taking into account that they are obtained with GMM density estimates trained on different draws
of frames from the training data pool and different random selections of bird/background tests. The
number of GMM components determined by the MDL criterion (see the second line of each cell of Table 2)
is also generally consistent across different species and feature vector cases. In the cases of the 4-
and 6-dimensional models considered here (last two columns of Table 2) the MDL minimum is not
achieved within the 1 to 15 components range but (as we have verified in separate experiments) the
performance does not improve consistently when the number of components is increased up to and
beyond the optimum MDL point.

4 Discussion - Conclusion 

The results presented above provide support for the main objective of this piece of research. We 
make use of a source of data (xeno-canto) that covers a very large and constantly increasing
proportion of the total number of vocalising avian species but provides less detailed labelling than is
typically used in most published works on the subject. We train detection models for individual 
species in a way that does not involve any manual preprocessing and is directly scalable to any 
selection of species. The normality models we learn from these data are shown to indeed achieve 
discrimination against non-bird sound background which we make no effort to model. Repetitions of 
the training algorithm over different randomly selected frame collections yield consistent detection 
results. 

On the other hand, the baseline approach presented here underperforms for a number of the species
(namely Luscinia luscinia, Luscinia megarhynchos and Turdus merula among the species considered
here). In the case of the best performing ‘mode’ scalar feature, the density estimates for these
species (not presented here) have their main lobe lower in frequency (in the region around 1-2 kHz)
and are thus closer to the characteristics of urban sounds and human speech. In recordings where
the acoustic background is more dense in other sources of biophony (non-bird animal vocalisations)
we would expect the discriminative ability of other species’ density models, which are distributed
higher in the frequency mode feature dimension, to also deteriorate.

The choice to build a normality model based on features obtained from statistics of spectrogram
frames (rather than, e.g., Mel-Frequency Cepstral Coefficient (MFCC) based features, another
widely employed choice) was motivated partly by positive results previously obtained with such
features in similar recognition tasks (see e.g. [11][10]) and equally by the aim of
keeping the feature vector dimension low in the interest of better convergence and interpretability
properties of the GMM fitting. However, a basic premise of the approach investigated here, namely
that a richer (and hence more discriminative) normality model can be obtained by using higher
dimension feature vectors, is not supported by our current results. A direction that we are currently
investigating to address this shortcoming is the incorporation of time-dependence modelling
in the learning scheme by fitting a Hidden Markov Model (HMM) to training data obtained in the
same way as described above.

Finally, it must be noted that, while the focus of this work is to evaluate the discriminative power
of this specific detection method, the practical use of such a detection method also requires the
application of a threshold level determining the decision boundary between normal and abnormal
test inputs ([12]). A principled probabilistic method for the determination of such a boundary for the
specific type of Gaussian mixture density models, which is based on Extreme Value Theory, is described
in [13]. We are currently investigating the practical effectiveness of threshold determination methods
for the specific bioacoustic detection problem at hand.